During the early months of the COVID-19 pandemic, home internet access became a major issue for the Charlotte Mecklenburg School System (CMS) as its students transitioned to online learning (https://www.wbtv.com/2020/08/22/cms-foundation-aims-provide-internet-service-students-without-home-access/). Some students, especially those zoned for particular schools, did not have reliable internet access, and CMS was forced to scramble to secure additional internet hot-spots. With this issue in mind, this project investigates the spatial distribution of home internet access throughout Mecklenburg County. The analysis uses estimates of household internet subscriptions linked to census block groups within the county.
Household internet access data was downloaded from the US Census website at https://data.census.gov/table?t=Telephone,+Computer,+and+Internet+Access&g=0500000US37119$1500000&tid=ACSDT5Y2020.B28002. This table (B28002, from the 2020 American Community Survey 5-year estimates) provides estimates of the number of households within each block group that had an internet subscription. I excluded households that have internet access without paying for a subscription, so that households that may receive hot-spots from CMS and other assistance programs are not counted. The shapefile for the block groups within North Carolina was downloaded from https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Block+Groups. Before analysis, the data had to be cleaned in several ways. The column titles were replaced with the descriptive labels in the first row, and that row was then deleted so that the table contained only data. I also deleted the recurring prefix ("1500000US") in front of each value in the csv's geography ID column to allow for joining with the shapefile's GEOID column. I removed problematic symbols from the column titles and renamed the target column for simplicity. Additionally, I converted the values in the total and target columns to numeric data. For the shapefile, I selected the features with Mecklenburg County’s county code (119) to narrow the extent to the proper region. All of this was completed in the R code below.
### Loads all necessary libraries required for data preparation and analysis
library(tidyverse)
library(sf)
library(tmap)
library(knitr)
library(kableExtra)
library(spdep)
library(spatstat)
library(maptools)
# Reads csv file of internet data and assigns it an object name
meck_int <- read_csv("../Data/ACSDT5Y2020.B28002-Data.csv")
# Reads North Carolina block groups shapefile and assigns it an object name
nc_bg <- st_read("../Data/tl_2020_37_bg.shp", quiet = TRUE)
# Confirms that shapefile geometry is valid, fixing issues in the data used to plot
nc_bg <- st_make_valid(nc_bg)
# Creates copy of the original csv to edit
internet_c <- meck_int
# Changes column names to the names in first row
colnames(internet_c) <- as.character(unlist(meck_int[1, ]))
# Removes the first row.
internet_c <- internet_c[-1,]
# Removes "1500000US" from values in Geography column to get uniform ID for joining
internet_c$Geography <- str_replace_all(internet_c$Geography, "1500000US", "")
# Removes problematic symbols/constants from column names
colnames(internet_c) <- gsub(":", "", colnames(internet_c))
colnames(internet_c) <- gsub("!!", "", colnames(internet_c))
# Simplifies target column name
names(internet_c)[which(names(internet_c) == "EstimateTotalWith an Internet subscription")] <- "TotalWithInternet"
# Makes target column's data numeric values
internet_c$EstimateTotal <- as.numeric(internet_c$EstimateTotal)
internet_c$TotalWithInternet <- as.numeric(internet_c$TotalWithInternet)
# Selects Mecklenburg County in shapefile
nc_bg <- nc_bg[nc_bg$COUNTYFP == "119",]
# Finds % of households with internet
internet_c$percent_internet <- 100 * internet_c$TotalWithInternet / internet_c$EstimateTotal
# Merges internet data with shapefile of block groups in Mecklenburg County
meck_int_sf <- merge(nc_bg, internet_c, by.x = "GEOID", by.y = "Geography")
# Creates summary table that calculates and labels statistical measurements
meck_summary <- tibble(Measure = c("Observations",
                                   "NA Values",
                                   "Minimum (%)",
                                   "Maximum (%)",
                                   "Mean (%)",
                                   "Standard Deviation"),
                       `Rate of Household Internet Access` = c(nrow(meck_int_sf),
                                                               sum(is.na(meck_int_sf$percent_internet)),
                                                               min(meck_int_sf$percent_internet, na.rm = TRUE),
                                                               max(meck_int_sf$percent_internet, na.rm = TRUE),
                                                               mean(meck_int_sf$percent_internet, na.rm = TRUE),
                                                               sd(meck_int_sf$percent_internet, na.rm = TRUE)
                       )
)
## Formats and prints table with statistics described in the code above
kable(meck_summary,
      digits = 1,
      format.args = list(big.mark = ",",
                         scientific = FALSE,
                         drop0trailing = TRUE),
      caption = "Summary of Mecklenburg County Block Group Household Internet Subscriptions") %>%
  kable_styling(bootstrap_options = c("striped",
                                      "hover",
                                      "condensed",
                                      "responsive"),
                full_width = FALSE)

| Measure | Rate of Household Internet Access |
|---|---|
| Observations | 624 |
| NA Values | 4 |
| Minimum (%) | 34.7 |
| Maximum (%) | 100 |
| Mean (%) | 89.3 |
| Standard Deviation | 11.7 |
### Prints histogram of the number of block groups associated with ranges of
### household internet subscription presence
ggplot(meck_int_sf,
aes(x = percent_internet)) +
geom_histogram(binwidth = 2) +
xlab("Household Internet Subscription Presence (%)") +
ggtitle("Mecklenburg County Block Group Household Internet Subscription Rate") +
theme_minimal()

There are 624 observations of block group household internet subscription rates in Mecklenburg County, and only 4 of those provide no usable data. The maximum internet subscription rate in a block group is 100%, while the minimum is 34.7%. The histogram shows a left skew: many block groups have a household internet subscription estimate greater than 90%, a range that also includes the mode. Counts drop off sharply below 90%, but the tail stretches out toward the minimum of 34.7%. Although no individual bin below 90% holds a high count, the tail as a whole still makes up a substantial portion of the data and pulls the mean down to 89.3%.
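The effect of a long lower tail on the mean can be illustrated with a small sketch. The `rates` vector below is made up for illustration and is not the ACS data; the same comparison could presumably be run on `meck_int_sf$percent_internet` with `na.rm = TRUE`.

```r
# Made-up subscription rates (%) with a long lower tail; NOT the ACS data
rates <- c(40, 62, 75, 84, 88, 91, 93, 95, 96, 97, 98, 99)

mean_rate <- mean(rates)
median_rate <- median(rates)

# Under a left skew, the few low values drag the mean below the median
print(c(mean = mean_rate, median = median_rate))
```

Comparing the mean with the median is a common, if rough, way to confirm the direction of skew seen in a histogram.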
### Sets to view mode to allow for interaction
tmap_mode("view")
### Creates new object of choropleth map with the percent_internet column
data_map <- tm_shape(meck_int_sf) +
  tm_polygons("percent_internet",
              style = "jenks", ### Sets data classification style
              palette = "RdYlGn") + ### Sets color palette
  tm_basemap(server = "OpenStreetMap") ### Changes default basemap background
### Creates similar map as above, but sets alpha to low number to increase transparency
### of the choropleth colors, also removes the legend
bg_map <- tm_shape(meck_int_sf) +
tm_polygons("percent_internet",
style = "jenks",
palette = "RdYlGn",
legend.show = FALSE,
alpha = 0.1) +
tm_basemap(server="OpenStreetMap")
### Creates two panels depicting maps above, also syncing them.
### Map on right intended to support analysis of spatial position in relation
### to parts of Mecklenburg County. For example, it can now be seen that one
### of the null value block groups is the Charlotte Douglas International Airport
tmap_arrange(data_map, bg_map, sync = TRUE)

The spatial distribution appears to include clusters and outliers. The area around Charlotte is particularly interesting. Using the map on the right to locate Charlotte, it can be seen from the map on the left that the block groups making up the heart of Downtown Charlotte have household internet subscription rates somewhere around 90%, as do many of the block groups south of Downtown. However, a potential cluster of low internet rates surrounds these block groups, forming a nearly complete perimeter. This cluster is particularly strong north of Downtown. The southern portion of the county, between Pineville and Weddington, appears to average high internet rates.
# Moran's I
### Removes null values to allow for a Moran's Test
rate_neighbors <- meck_int_sf[which(!is.na(meck_int_sf$percent_internet)), ]
### Creates Queen case neighbors
rate_queen <- poly2nb(rate_neighbors,
queen = TRUE)
### Converts neighbor object to weight matrix
rate_weight <- nb2listw(rate_queen,
style = "B",
zero.policy = TRUE)
### Moran's I test for spatial neighbor correlation
meck_moran <- moran.test(rate_neighbors$percent_internet, ### Selects target data
rate_weight, ### Selects weights
randomisation = TRUE,
zero.policy = TRUE)
### Prints result of test
meck_moran
##
## Moran I test under randomisation
##
## data: rate_neighbors$percent_internet
## weights: rate_weight
##
## Moran I statistic standard deviate = 18.449, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.4134833454 -0.0016155089 0.0005062393
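As a quick arithmetic check on the output above, the reported standard deviate can be reproduced from the three sample estimates. This assumes the test statistic is the observed Moran's I minus its expectation, divided by the square root of its variance, and that the expectation under no spatial autocorrelation is $-1/(n-1)$, where $n = 620$ is the number of block groups with data (624 minus the 4 NA values):

$$
E[I] = -\frac{1}{n-1} = -\frac{1}{619} \approx -0.0016155,
\qquad
z = \frac{I - E[I]}{\sqrt{\mathrm{Var}(I)}}
  = \frac{0.4134833 - (-0.0016155)}{\sqrt{0.0005062393}}
  \approx \frac{0.4150988}{0.0224998}
  \approx 18.449
$$

Both values agree with the printed expectation and standard deviate.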
# LISA
### Computes local Moran's I (LISA) statistics for each block group
rate_lisa <- localmoran(rate_neighbors$percent_internet,
rate_weight,
zero.policy = TRUE) %>%
as.data.frame()
### Calculate deviation from mean for values
dev <- rate_neighbors$percent_internet - mean(rate_neighbors$percent_internet)
### Find neighbor lag values
lag_n <- lag.listw(rate_weight, rate_neighbors$percent_internet)
### Calculate lag values' deviation from mean
lag_dev <- lag_n - mean(lag_n, na.rm = TRUE)
### Add column to hold future output
rate_lisa$Cat <- rep("0", nrow(rate_neighbors))
### Adds labels based on values
rate_lisa$Cat[which(dev > 0 & lag_dev > 0 & rate_lisa[,5] < 0.05)] <- "HH"
rate_lisa$Cat[which(dev < 0 & lag_dev < 0 & rate_lisa[,5] < 0.05)] <- "LL"
rate_lisa$Cat[which(dev < 0 & lag_dev > 0 & rate_lisa[,5] < 0.05)] <- "LH"
rate_lisa$Cat[which(dev > 0 & lag_dev < 0 & rate_lisa[,5] < 0.05)] <- "HL"
### Prints table summary
table(rate_lisa$Cat)
##
## 0 HH HL LH LL
## 530 9 5 21 55
### Adds column of LISA test results to spatial data table
rate_neighbors$LISACAT <- rate_lisa$Cat
### Maps LISA results
lisa_map <- tm_shape(rate_neighbors) +
  tm_polygons("LISACAT",
              style = "cat",
              palette = c("grey",      # 0 (not significant)
                          "blue",      # HH
                          "lightblue", # HL
                          "pink",      # LH
                          "red"),      # LL
              border.col = "Black",
              border.alpha = 0.25) +
  tm_layout(legend.outside = TRUE) +
  tm_basemap(server = "OpenStreetMap")
tmap_arrange(data_map, lisa_map, sync = TRUE)

The global Moran’s I statistic is 0.413 (p-value < 2.2e-16), which suggests that internet subscription rates are moderately and positively spatially autocorrelated in the county. However, the block groups around Charlotte appear to account for most of this correlation. The LISA test finds that 530 of the 620 block groups with data are not significantly clustered with their neighbors. Of the rest, 9 are high values clustered among neighbors with high subscription rates (HH), and 55 are low values clustered among neighbors with low rates (LL). There are 5 high outliers in areas with low rates (HL) and 21 low outliers in areas with high rates (LH). It is important to note that low values make up considerably more of the outliers and clusters than high values do. It is also notable that the majority of the observed clusters and outliers are concentrated in Charlotte’s periphery, with very few spatial patterns seen elsewhere in the county aside from some high clusters in the south.
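To make the two statistics more concrete, here is a minimal base R sketch on a toy layout of four areas in a row. The values and the adjacency structure are invented for illustration, and the quadrant labels skip the significance test that `localmoran()` provides, so this is only a hand calculation of the general idea rather than a reproduction of the analysis above.

```r
# Toy example: four areas in a row, each adjacent only to its immediate neighbours.
# All values are made up for illustration; they are NOT from the ACS data.
x <- c(60, 70, 90, 100)

# Binary contiguity matrix (1 = neighbours), analogous to style = "B" weights
w <- matrix(0, nrow = 4, ncol = 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)

# Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
# where z are deviations from the mean and S0 is the sum of all weights
z <- x - mean(x)
moran_i <- (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)

# Quadrant of each area: sign of its own deviation, then the sign of its
# neighbours' average deviation (significance testing is omitted here)
lag_z <- as.vector(w %*% z) / rowSums(w)
quadrant <- paste0(ifelse(z > 0, "H", "L"), ifelse(lag_z > 0, "H", "L"))

print(moran_i)
print(quadrant)
```

In this toy layout the low values sit next to each other and so do the high values, so the statistic is positive and every area falls in either the LL or the HH quadrant.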
The estimated rate of household internet subscriptions across Mecklenburg County block groups is spatially autocorrelated. Most of the notable observations are concentrated along Downtown Charlotte’s northern periphery, and most clusters and outliers consist of low internet subscription rates. The rest of the county is mostly composed of block groups that sit closer to the mean and have no significant relationship with their neighbors.